Fractional Max-Pooling

Author

  • Benjamin Graham
Abstract

Convolutional networks almost always incorporate some form of spatial pooling, and very often it is α × α max-pooling with α = 2. Max-pooling acts on the hidden layers of the network, reducing their size by an integer multiplicative factor α. The amazing by-product of discarding 75% of your data is that you build into the network a degree of invariance with respect to translations and elastic distortions. However, if you simply alternate convolutional layers with max-pooling layers, performance is limited due to the rapid reduction in spatial size and the disjoint nature of the pooling regions. We have formulated a fractional version of max-pooling where α is allowed to take non-integer values. Our version of max-pooling is stochastic, as there are lots of different ways of constructing suitable pooling regions. We find that our form of fractional max-pooling reduces overfitting on a variety of datasets: for instance, we improve on the state of the art for CIFAR-100 without even using dropout.

1 CONVOLUTIONAL NEURAL NETWORKS

Convolutional networks are used to solve image recognition problems. They can be built by combining two types of layers:

  • Layers of convolutional filters.
  • Some form of spatial pooling, such as max-pooling.

Research focused on improving the convolutional layers has led to a wealth of techniques such as dropout (Srivastava et al., 2014), DropConnect (Wan et al., 2013), deep networks with many small filters (Ciresan et al., 2012), large input layer filters for detecting texture (Krizhevsky et al., 2012), and deeply supervised networks (Lee et al., 2014). By comparison, the humble pooling operation has been slightly neglected. For a long time 2 × 2 max-pooling (MP2) has been the default choice for building convolutional networks. There are many reasons for the popularity of MP2-pooling: it is fast, it quickly reduces the size of the hidden layers, and it encodes a degree of invariance with respect to translations and elastic distortions. However, the disjoint nature of the pooling regions can limit generalization. Additionally, as MP2-pooling reduces the size of the hidden layers so quickly, stacks of back-to-back convolutional layers are needed to build really deep networks (Lin et al., 2014; Simonyan & Zisserman, 2014; Szegedy et al., 2014). Two methods that have been proposed to overcome these problems are:

  • Using 3 × 3 pooling regions overlapping with stride 2 (Krizhevsky et al., 2012).
  • Stochastic pooling, where the act of picking the maximum value in each pooling region is replaced by a form of size-biased sampling (Zeiler & Fergus).

However, both these techniques still reduce the size of the hidden layers by a factor of two. It seems natural to ask whether spatial pooling can usefully be applied in a gentler manner. If pooling were only to reduce the size of the hidden layers by a factor of √2, then we could use twice as many layers of pooling. Each layer of pooling is an opportunity to view the input image at a different scale. Viewing images at the 'right' scale should make it easier to recognize the tell-tale features that identify an object as belonging to a particular class.
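To make the idea concrete, here is a minimal NumPy sketch of disjoint fractional max-pooling with a non-integer reduction ratio. The abstract only says that there are many ways to construct suitable pooling regions; building the region boundaries from a random shuffle of widths 1 and 2 is one plausible choice consistent with that description, not necessarily the paper's exact scheme, and the function names and sizes below are illustrative.

    import numpy as np

    def fmp_boundaries(n_in, n_out, rng):
        # Split n_in cells into n_out consecutive pooling intervals whose widths
        # are a random shuffle of 1s and 2s (assumes n_out <= n_in <= 2 * n_out).
        n_twos = n_in - n_out
        n_ones = 2 * n_out - n_in
        widths = np.concatenate([np.ones(n_ones, dtype=int),
                                 2 * np.ones(n_twos, dtype=int)])
        rng.shuffle(widths)
        return np.concatenate([[0], np.cumsum(widths)])  # n_out + 1 boundaries

    def fractional_max_pool(x, n_out, rng):
        # Disjoint fractional max-pooling of a 2-D feature map x down to n_out x n_out.
        rows = fmp_boundaries(x.shape[0], n_out, rng)
        cols = fmp_boundaries(x.shape[1], n_out, rng)
        out = np.empty((n_out, n_out), dtype=x.dtype)
        for i in range(n_out):
            for j in range(n_out):
                out[i, j] = x[rows[i]:rows[i + 1], cols[j]:cols[j + 1]].max()
        return out

    rng = np.random.default_rng(0)
    x = rng.standard_normal((36, 36))
    y = fractional_max_pool(x, 25, rng)  # reduction ratio 36/25 = 1.44, close to sqrt(2)
    print(y.shape)                       # (25, 25)

Because a fresh set of boundaries is drawn each time, repeated applications pool the same input with different region layouts, which is the stochastic aspect the abstract refers to.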


Similar resources

Max-Pooling Dropout for Regularization of Convolutional Neural Networks

Recently, dropout has seen increasing use in deep learning. For deep convolutional neural networks, dropout is known to work well in fully-connected layers. However, its effect in pooling layers is still not clear. This paper demonstrates that max-pooling dropout is equivalent to randomly picking activation based on a multinomial distribution at training time. In light of this insight, we advoc...

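The equivalence claimed in this snippet is easy to check numerically. The sketch below is illustrative only: it concerns the cited related work on max-pooling dropout, not the Fractional Max-Pooling paper itself, and the retain probability of 0.7 and the function names are assumptions. It drops activations inside one pooling region and max-pools the survivors, then samples the same quantity directly from the implied multinomial distribution; the two estimators agree up to Monte Carlo noise.

    import numpy as np

    def maxpool_dropout(region, retain_p, rng):
        # Drop each unit with probability 1 - retain_p, then max-pool the survivors
        # (0 if everything in the region is dropped).
        mask = rng.random(region.shape) < retain_p
        kept = region[mask]
        return kept.max() if kept.size else 0.0

    def multinomial_equivalent(region, retain_p, rng):
        # Sample the pooled value directly: the i-th largest activation is chosen
        # iff all larger ones are dropped and it is retained.
        a = np.sort(region)[::-1]                    # activations, descending
        q = 1.0 - retain_p
        probs = retain_p * q ** np.arange(a.size)    # P(pick i-th largest)
        probs = np.append(probs, q ** a.size)        # P(all dropped) -> output 0
        outcomes = np.append(a, 0.0)
        return rng.choice(outcomes, p=probs)

    rng = np.random.default_rng(1)
    region = rng.random(4)                           # one 2x2 pooling region, flattened
    s1 = [maxpool_dropout(region, 0.7, rng) for _ in range(50000)]
    s2 = [multinomial_equivalent(region, 0.7, rng) for _ in range(50000)]
    print(np.mean(s1), np.mean(s2))                  # means agree up to sampling noise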

Towards dropout training for convolutional neural networks

Recently, dropout has seen increasing use in deep learning. For deep convolutional neural networks, dropout is known to work well in fully-connected layers. However, its effect in convolutional and pooling layers is still not clear. This paper demonstrates that max-pooling dropout is equivalent to randomly picking activation based on a multinomial distribution at training time. In light of this...


The Enhanced Hybrid MobileNet

Although complicated and deep neural network models can achieve high accuracy in image recognition, they require a huge amount of computation and parameters and are not suitable for mobile and embedded devices. As a result, MobileNet [1] was proposed, which can reduce the number of parameters and the computational cost dramatically. MobileNet [1] is based on depthwise separable convolution and has t...


Emergence of Selective Invariance in Hierarchical Feed Forward Networks

Many theories have emerged which investigate how invariance is generated in hierarchical networks through simple schemes such as max and mean pooling. The restriction to max/mean pooling in theoretical and empirical studies has diverted attention away from a more general way of generating invariance to nuisance transformations. In this exploratory study, we study the conjecture that hierarchica...


Supplementary Material for Binarized Convolutional Landmark Localizers for Human Pose Estimation and Face Alignment with Limited Resources

This section provides additional details for some of the ablation studies reported in Section 6. Pooling type. In the context of binary networks, and because the output is restricted to 1 and -1, max-pooling might result in outputs full of 1s only. To limit this effect, we placed the activation function before the convolutional layers as proposed in [5, 9]. Additionally, we opted to replace max...



Journal title:
  • CoRR

Volume: abs/1412.6071

Publication date: 2014